1 Installation

Follow the instructions on the Hades website on how to set up R and how to install HADES.

While this tutorial does not use many of the Hades packages, these are often quite useful to have. This tutorial does require DatabaseConnector and SqlRender from Hades, along with keyring, dplyr, kableExtra, DT and their dependencies. These can be installed directly from CRAN.

install.packages(c("SqlRender", "DatabaseConnector",
                   "keyring", "dplyr", "kableExtra"))

We will use keyring to hide database credentials. The following codes only needs to be executed once per machine.

keyring::key_set_with_value("strategusTutorialUser", password = "<FILL-IN>")
keyring::key_set_with_value("strategusTutorialPassword", password = "<FILL-IN>")
keyring::key_set_with_value("strategusTutorialServer", password = "<FILL-IN>")

Note that keyring requires libsodium that is sometimes not available.
Alternatively, special values can be held in as environmental variables (in eg .Renviron) and accessed via Sys.getenv("strategusTutorialUser").

2 Access and schema

We hold Strategus results on the OHDSI public database server shinydb. The web-page at https://bit.ly/strategus_data_model describes the data models for Strategus modules and some instructions on how to access results in these models. Our database server runs postgresql. You can install the postgresql JDBC drivers; this only needs to be done once.

It is recommend to specify the environmental variable DATABASECONNECT_JAR_FOLDER=<folder-of-your-choice> in .Renviron located in the user’s home directory. In the example below, however, we set the environmental variable manually in R to make this document more self-contained.

Sys.setenv(DATABASECONNECTOR_JAR_FOLDER = getwd())
DatabaseConnector::downloadJdbcDrivers(dbms = "postgresql")

and set-up connection details, in this case using keyring to hide usernames and passwords for security`

# OHDSI shinydb read-only credentials
connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "postgresql",
  server = keyring::key_get("strategusTutorialServer"),
  user = keyring::key_get("strategusTutorialUser"),
  password = keyring::key_get("strategusTutorialPassword"))

Table 2.1 provides the schema names for some Strategus results.

Table 2.1: Strategus study schema
study schema
GLP1RA vs DPP4I tutorial strategus_tutorial
Fluoroquinolone and aortic aneurysms quinoloneaa
Intravitreal anti-VEGF and kidney failure antivegfkf

3 Retrieving database meta-data

We will use the Hades packages DatabaseConnector and SqlRender to query the results tables directly in R.
Many joins and column-select can be done either on the server-side (postgresql) or client-side (R). We prefer the server-side when accessing large tables. On the client-side, dplyr is effective.

Here is an example of a basic query to database_meta_data to return data-source names and abbreviations (with client-side select):

schema <- "strategus_tutorial"

connection <- DatabaseConnector::connect(connectionDetails = connectionDetails)

sql <- SqlRender::render(sql = "
  SELECT * FROM @schema.database_meta_data",
  schema = schema) # parameterized
                        
metaData <- DatabaseConnector::querySql(connection = connection,
                                        sql = sql,
                                        snakeCaseToCamelCase = TRUE)

DatabaseConnector::disconnect(connection)

names(metaData)
##  [1] "cdmSourceName"                "cdmSourceAbbreviation"       
##  [3] "cdmHolder"                    "sourceDescription"           
##  [5] "sourceDocumentationReference" "cdmEtlReference"             
##  [7] "sourceReleaseDate"            "cdmReleaseDate"              
##  [9] "cdmVersion"                   "cdmVersionConceptId"         
## [11] "vocabularyVersion"            "databaseId"                  
## [13] "maxObsPeriodEndDate"
databaseName <- metaData %>% select(cdmSourceAbbreviation, databaseId)
databaseName

Now is a good time to (manually) change the abbreviations in databaseName to consistent, similar-length, reader-friendly choices for our manuscript(s).

databaseName$shortDatabaseName <- c("CCAE", "JMDC", "MDCR", "OEHR")

Likewise, we retrieve reader-friendly cohort names. Note that the following codes does not return negative control cohort names; these are not (?) currently stored in the results-set.

connection <- DatabaseConnector::connect(connectionDetails = connectionDetails)
sql <- SqlRender::render(sql = "
  SELECT DISTINCT cohort_id, cohort_name FROM @schema.cg_cohort_generation",
  schema = schema)
                        
cohortName <- DatabaseConnector::querySql(connection = connection,
                                            sql = sql,
                                            snakeCaseToCamelCase = TRUE) 

DatabaseConnector::disconnect(connection)

Often these names are unusable for reader-friendly documents. See, e.g., cohortName[1, "cohortName"] which is [OHDSItutorial] GLP1RA exposures 60-day eras - in cohorts: (19022) starts within D: -99999 - D: 0 of cohort start and ends D: 0 - D: 99999 of cohort start, first ever occurence with at least 365 days prior observation and 1 days follow up observation, males, females, occurs after 2017-12-01 and before 2023-12-31. We will re-code these names for prettier output. Here we demonstrate DT::datatable that provides an interactive (HTML-only) table renderer useful for web-pages and presentations.

cohortName$shortCohortName <- c("GLP1RA", "AMI/inpat", "AMI/all", "Diarrhea", "DPP4I/ex",
                                "T2DM/restrict", "T2DM/all", "GLP1RA/ex", "DPP4I") 
                               

DT::datatable(cohortName)

4 Basic results queries

Here is an example of a basic query to a cohort_counts table to return cohort sizes. We then output the counts using a well-formatted table

connection <- DatabaseConnector::connect(connectionDetails = connectionDetails)
sql <- SqlRender::render(sql = "
  SELECT database_id, cohort_id, cohort_subjects FROM @schema.cg_cohort_count",
  schema = schema)
                        
cohortCount <- DatabaseConnector::querySql(connection = connection,
                                            sql = sql,
                                            snakeCaseToCamelCase = TRUE) %>% 
  inner_join(databaseName, by = "databaseId") %>% 
  inner_join(cohortName, by = "cohortId") %>%
  select(shortCohortName, shortDatabaseName, cohortSubjects) %>%
  rename(cohortName = shortCohortName, databaseName = shortDatabaseName) %>%
  arrange(cohortName,  databaseName)

DatabaseConnector::disconnect(connection)

4.1 Hints for beautiful tables

The kableExtra package (especially when printed to PDF) makes beautiful tables.

table <- cohortCount %>% filter(cohortName %in% c("GLP1RA", "DPP4I")) %>%
  rename(Cohort = cohortName, Database = databaseName, Subjects = cohortSubjects) %>%
  mutate(Subjects = prettyNum(Subjects, 
                              format = "d", 
                              big.mark = ",")) %>% # Can easily change for _Lancet_
  tidyr::spread(Cohort, Subjects)

tab <- table %>% kbl(align = "lrr") %>% kable_classic(full_width = F) %>%
  add_header_above(c(" " = 1, "Exposure" = 2)) %>%
  row_spec(0, align = "c")

if (knitr::is_latex_output()) {
  tab %>% kable_styling(latex_options = "striped",
                          font_size = latex_table_font_size)
} else {
  tab ## bootstrap-striped is not nicely compatible with `add_header_above`
}
Exposure
Database DPP4I GLP1RA
CCAE 89,137 306,463
JMDC 82,051 15,381
MDCR 13,715 28,520
OEHR 279,475 459,266

Often it is most convenient to hand-craft latex table headers and footers and cat this material directly to the output. Here is a silly example.

# Requires "results='asis'"

library(xtable)
cat(gsub("\\\\", "\\", r"{
  \begin{table}[h]
  \begin{center}
  \begin{tabular}{lrr}
  \hline
  \multicolumn{1}{c}{$\alpha$} & \multicolumn{1}{c}{$\beta$} & \multicolumn{1}{c}{$\delta$} \\\\
}", fixed = TRUE))

print(xtable(table, format = "latex"), 
      include.rownames = FALSE, include.colnames = FALSE, only.contents = TRUE)

cat(gsub("\\\\", "\\", r"{
  \end{tabular}
  \end{center}
  \caption{A silly caption with \LaTeX commands}
  \end{table}
}", fixed = TRUE))

Note that if one abhors escaping backslash-characters, then one can read the header and footer from text files.

5 Working with CohortMethod

Unfortunately, we currently do not have a shared repository of functions to generate typical results figures and tables. Even OhdsiShinyModules hides these tools inside inaccessible shiny-functions. Until we solve this limitation, we recommend duplicating code from previous public manuscripts. The LegendT2dmEvidenceExplorer from LEGEND-T2DM is a good example: getCovariateBalance()

5.1 Query the design choices and comparisons

connection <- DatabaseConnector::connect(connectionDetails)

sql <- paste0("SET search_path TO ", schema, ";")
DatabaseConnector::executeSql(connection = connection, sql = sql)

sql <- "
  SELECT DISTINCT target_id, comparator_id,
    cm_analysis.analysis_id, cm_analysis.description
  FROM cm_covariate_balance
  INNER JOIN cm_analysis
  ON cm_analysis.analysis_id = cm_covariate_balance.analysis_id"

analysis <- DatabaseConnector::querySql(connection = connection,
                                        sql = sql,
                                        snakeCaseToCamelCase = TRUE)

DatabaseConnector::disconnect(connection)

datatable(analysis %>% arrange(analysisId, targetId, comparatorId))

5.2 Covariate balance

Shamelessly (until these functions are exported) stolen from OhdsiStudyModules/R:

# Stolen from `OhdsiShinyModules`
getCovariateBalance <- function(connectionHandler,
                                resultDatabaseSettings,
                                targetId,
                                comparatorId,
                                analysisId,
                                databaseId = NULL) {
  
      sql <- "
      SELECT
        cmscb.database_id,
        cmc.covariate_name,
        --cmc.analysis_id analysis_id,
        cmscb.target_mean_before before_matching_mean_treated,
        cmscb.comparator_mean_before before_matching_mean_comparator,
        abs(cmscb.std_diff_before) abs_before_matching_std_diff, --absBeforeMatchingStdDiff 
        cmscb.target_mean_after after_matching_mean_treated,
        cmscb.comparator_mean_after after_matching_mean_comparator,
        abs(cmscb.std_diff_after) abs_after_matching_std_diff
      FROM
        @results_schema.@cm_table_prefixshared_covariate_balance cmscb 
        JOIN @results_schema.@cm_table_prefixcovariate cmc ON cmscb.covariate_id = cmc.covariate_id AND cmscb.analysis_id = cmc.analysis_id AND cmscb.database_id = cmc.database_id -- database_id optional
       -- JOIN @results_schema.@cm_table_prefixcovariate_analysis cmca ON cmca.analysis_id = cmc.analysis_id  -- question: shouldn't we have a covariate_analysis_id in @table_prefixcovariate table?
      WHERE
        cmscb.target_id = @target_id
        AND cmscb.comparator_id = @comparator_id
        AND cmscb.analysis_id = @analysis_id
        --AND cmc.covariate_analysis_id = @covariate_analysis_id
        AND cmscb.database_id = '@database_id'
    "
    
  sql <- SqlRender::render(sql,
                           target_id = targetId,
                           comparator_id = comparatorId,
                           database_id = databaseId,
                           results_schema = schema,
                           cm_table_prefix = "cm_",
                           analysis_id = analysisId)
                           
  
  result <- DatabaseConnector::querySql(connection, sql,
                                         snakeCaseToCamelCase = TRUE)
  
  return(result)
}

connection <- DatabaseConnector::connect(connectionDetails)

balance <- getCovariateBalance(connection, schema,
                               targetId = 19137001,     # GLP1RA
                               comparatorId = 19021001, # DPP4I
                               analysisId = 1,
                               databaseId = 1902274292) # CCAE

DatabaseConnector::disconnect(connection)

datatable(balance %>% select(-databaseId),
          colnames = c("Name", "bT", "bC", "bSD", "aT", "aC", "aSD"),
          rownames = NULL)

It remains straight-forward to adapt existing plots or generate one’s own.

plot(balance$absBeforeMatchingStdDiff, balance$absAfterMatchingStdDiff,
     xlim = c(0, 0.4), ylim = c(0, 0.4),
     xlab = "Before", ylab = "After")
abline(h = 0.1, lty = 2)

When saving figures to disk, use a vector-graphic format (e.g., PDF), as these formats can scale to any size without resolution loss. For R-base plotting, the relevant functions are pdf() and dev.off(). For gg plotting, use ggsave(filename = "<NAME>.pdf").

5.3 Effect estimates

connection <- DatabaseConnector::connect(connectionDetails)

sql <- paste0("SET search_path TO ", schema, ";")
DatabaseConnector::executeSql(connection = connection, sql = sql)

cmResult <- DatabaseConnector::querySql(connection = connection,
                                        sql = "SELECT * FROM cm_result",
                                        snakeCaseToCamelCase = TRUE)

cmDiagnostics <- DatabaseConnector::querySql(connection = connection,
                                        sql = "SELECT * FROM cm_diagnostics_summary",
                                        snakeCaseToCamelCase = TRUE)

DatabaseConnector::disconnect(connection)

subset <- cmResult %>%
  filter(outcomeId %in% c(19023, 19024, 19059),           # All outcomes
         targetId == 19137001, comparatorId == 19021001,  # GLP1RA vs DPP4I
         analysisId == 1) %>%
  inner_join(cmDiagnostics %>% 
               select(databaseId, analysisId, targetId, comparatorId, outcomeId, unblind)) %>%
  inner_join(databaseName %>% 
               select(databaseId, shortDatabaseName) %>%
               rename(databaseName = shortDatabaseName) %>%
               mutate(minorOrder = row_number()))

# Uses a custom function from the LEGEND-T2DM manuscript package   
makePrettyFigure(subset %>% 
                   mutate(majorOrder = 1,
                          pass = unblind,
                          cumMajorSkip = 0),
                 fileName = "StrategusFigure2.pdf",
                 primaryAnalysisId = 1,
                 outcomeIds = c(19023, 19024, 19059),
                 outcomeNames = c("AMI/inpat", "AMI/all", "Diarrhea"))

Custom plotting function reporting hazard ratio (HR) and 95% confidence interval estimates across data sources.  Note the beautiful `wesanderson` inspired colors.

Figure 5.1: Custom plotting function reporting hazard ratio (HR) and 95% confidence interval estimates across data sources. Note the beautiful wesanderson inspired colors.